Mitigating Economic Losses Through Flood Predictions: Analyzing the Impact of Affected Populations on Economic Damages
Author
SID: 540305506 | University of Sydney | DATA1001 | October 2024
1. Client Bio and Recommendation
1.1 Client: IFRC - Flood Resilience Program
Bio: IFRC - Flood Resilience Program focuses on improving flood resilience in vulnerable communities across Asia, Africa, and other regions. Their initiatives include disaster preparedness, community education, and policy recommendations to mitigate flood impacts.
1.2 Recommendation
The main goal of the program is to reduce the number of people impacted by floods and minimize economic losses. This report demonstrates a moderate correlation between the number of affected people and economic damages, indicating that reducing the number of people affected by floods not only protects human lives but also helps reduce economic losses, especially in Asia. While many other factors can influence economic damage from floods, reducing the number of affected people remains a key factor. Through appropriate measures such as early warning systems, floodplain zoning, and the protection of vulnerable communities, the economic damage from future natural disasters can be minimized if the impact of floods on people is mitigated effectively.
2. Evidence
Code
# Load necessary libraries
library(tidyverse)

# Set options to avoid scientific notation in plots
options(scipen = 999)

# 1. Load Data
# The dataset 'natural-disasters.csv' contains information on natural disasters, including floods.
# Select the relevant columns for analysis.
raw_data <- read.csv("natural-disasters.csv")

# 2. Data Wrangling
# Extract the necessary columns related to floods and economic damage
data <- raw_data[, c("Entity", "Year", "Number.of.people.affected.by.floods", "Total.economic.damages.from.floods")]

# Clean the data by omitting rows with missing values
# This step ensures that the dataset is ready for analysis and free from NA values
clean_data <- na.omit(data)
To demonstrate evidence-based decision-making, we analyse data related to the correlation between the number of people affected by floods and total economic damages.
2.1 Relationship between Number of People Affected and Total Economic Damages
Code
# Compute the correlation between the two variables
correlation = cor(clean_data$Number.of.people.affected.by.floods, clean_data$Total.economic.damages.from.floods)

# Rounding the value to 2dp
round(correlation, 2)
[1] 0.74
Code
# Loading necessary libraries
library(ggplot2)
library(plotly)

# Produce a scatterplot with a regression line
p = ggplot(clean_data, aes(x = Number.of.people.affected.by.floods, y = Total.economic.damages.from.floods)) +
  geom_point(color = "red") +  # Setting the points to red for scatterplot
  geom_smooth(method = "lm", color = "blue", se = FALSE) +
  labs(title = "Figure 2a. Relationship Between Number of People Affected and Economic Damages",
       x = "Number of people affected",
       y = "Total economic damages (USD)") +
  scale_x_continuous(labels = scales::comma) +  # Format x-axis with commas
  scale_y_continuous(labels = scales::comma) +  # Format y-axis with commas
  theme_minimal()

# Make it interactive
l = ggplotly(p)
l
Correlation: 0.74 Linear equation: y = 0.156255x + 85843.659821
The correlation coefficient of 0.74 indicates a moderate linear relationship between the two variables. The scatterplot with the regression line shows that as the number of people affected by floods increased, total economic damages from floods also increased. For each additional person affected by floods, economic damages are predicted to rise by approximately 0.156 USD.
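As a rough illustration of how the fitted line converts an affected-population count into a predicted damage figure, the sketch below hard-codes the intercept and slope reported in the model summary in section 3.2.2. The helper name predict_damages and the input of 1,000,000 people are hypothetical and chosen only for illustration; they are not values from the dataset.

```r
# Coefficients copied from the lm() summary reported in this document
intercept <- 85843.659821  # predicted damages (USD) when nobody is affected
slope     <- 0.156255      # additional damages (USD) per extra person affected

# Hypothetical helper: predicted total economic damages (USD) for a given
# number of people affected, using the fitted straight line
predict_damages <- function(people_affected) {
  intercept + slope * people_affected
}

# Illustrative input only, not an observation from the dataset
predicted <- predict_damages(1e6)
print(round(predicted))
```

Because the residual plot shows large outliers, a prediction like this is best read as an average tendency rather than a precise forecast for any single flood.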
Code
# Fit the linear model (economic damages as the response, people affected as the predictor)
model = lm(Total.economic.damages.from.floods ~ Number.of.people.affected.by.floods, data = clean_data)

# Loading necessary libraries
library(ggplot2)
library(plotly)

# Create residual plot
p = ggplot(model, aes(x = .fitted, y = .resid)) +  # Plotting fitted values and residuals
  geom_point(color = "red") +  # Setting the points to red for scatterplot
  geom_hline(yintercept = 0, linetype = "dashed", colour = "blue") +  # Horizontal line at 0
  labs(title = "Figure 2b. Residual plot of Number of People Affected and Economic Damages",
       x = "Fitted values",
       y = "Residual values") +
  scale_x_continuous(labels = scales::comma) +  # Format x-axis with commas
  scale_y_continuous(labels = scales::comma)    # Format y-axis with commas

# Make the plot interactive
l = ggplotly(p)
l
However, the uneven distribution around the regression line and the presence of outliers suggest that this model does not sufficiently explain all the factors affecting total economic damages.
2.2 Hypothesis Testing
A hypothesis-testing framework is used to examine the relationship between the Number of people affected and Total economic damages in order to strengthen the evidence of a linear correlation between the two variables.
Null Hypothesis (H0): There is no significant linear relationship between the number of people affected by floods and economic damages. The slope coefficient is equal to zero (𝛽 = 0).
Alternative Hypothesis (H1): There is a significant positive linear relationship between the number of people affected by floods and economic damages. The slope coefficient is greater than zero (𝛽 > 0).
p-value < 2 × 10^-16 < α = 0.05
Therefore, H0 is rejected, indicating the positive linear relationship between the number of people affected by floods and total economic damages was highly significant.
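The decision above can be re-derived from the coefficient table alone. The following is a minimal sketch that assumes the slope estimate, standard error, and residual degrees of freedom printed in the model summary in section 3.2.2; the variable names are illustrative, and the one-sided p-value is computed here to match H1 (𝛽 > 0), whereas the summary output reports a two-sided value.

```r
# Values copied from the coefficient table of the lm() summary in this document
estimate  <- 0.156255  # slope estimate
std_error <- 0.004199  # standard error of the slope
df_resid  <- 1122      # residual degrees of freedom

# Test statistic for H0: beta = 0
t_stat <- estimate / std_error

# One-sided p-value for H1: beta > 0 (upper tail of the t distribution)
p_one_sided <- pt(t_stat, df = df_resid, lower.tail = FALSE)

# Compare against the significance level used in the report
alpha <- 0.05
reject_h0 <- p_one_sided < alpha

print(round(t_stat, 2))
print(reject_h0)
```

The recomputed statistic differs slightly from the printed t value in the last decimal places because the inputs here are already rounded.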
2.3 Continent-Specific Analysis
Code
# Loading necessary libraries
library(DT)  # For rendering interactive tables

# Grouping and summarizing data by continent
# This step filters data for the specified continents and sums the number of people affected by floods.
continent_summary <- clean_data %>%
  filter(Entity %in% c("Asia", "Africa", "Europe", "North America", "South America", "Oceania")) %>%  # Filter for relevant continents
  group_by(Entity) %>%  # Group data by continent
  summarise(Total_Affected = sum(Number.of.people.affected.by.floods, na.rm = TRUE))  # Summing the number of affected people, handling missing values with na.rm = TRUE

# Rendering the table
# datatable() is used to display the summarized data in an interactive, paginated table.
datatable(
  continent_summary,  # Data to display
  options = list(pageLength = 5, autoWidth = TRUE),  # Display options: set page length to 5 and enable auto width adjustment
  caption = 'Table: Total number of people affected by floods across continents'  # Caption for the table
)
According to the table, Asia was the most heavily impacted continent from 1900-2010, with the highest number of affected people. This suggests that large, densely populated areas, especially in coastal zones and low-lying river plains, are highly vulnerable to flood risks[1]. The concentration of populations in these flood-prone regions increases their susceptibility to significant flooding impacts.
2.4 Case Study: Asia (1960–2000)
Code
# Loading necessary libraries for data visualization and manipulation
library(ggplot2)  # For creating visualizations
library(dplyr)    # For data manipulation using the tidyverse
library(plotly)   # For interactive plot

# Filtering data for Asia for the years 1960, 1970, 1980, 1990, and 2000
# This selects relevant data for specific years and for Asia only.
asia_data <- clean_data %>%
  filter(Entity == "Asia", Year %in% c(1960, 1970, 1980, 1990, 2000))  # Filter the data by continent and year

# Creating a plot to visualize the relationship between total economic damages and the number of people affected by floods
p = ggplot(asia_data, aes(x = Year)) +  # Plotting Year on the x-axis
  geom_line(aes(y = `Total.economic.damages.from.floods`, color = "Total Economic Damages (USD)"), size = 1.2, linetype = "solid") +  # Plot solid line for economic damages
  geom_line(aes(y = `Number.of.people.affected.by.floods`, color = "Number of People Affected"), size = 1.2, linetype = "dashed") +  # Plot dashed line for number of people affected
  geom_point(aes(y = `Total.economic.damages.from.floods`, color = "Total Economic Damages (USD)"), size = 3) +  # Adding points for clarity
  geom_point(aes(y = `Number.of.people.affected.by.floods`, color = "Number of People Affected"), size = 3, shape = 4) +  # Add points for affected people
  labs(title = "Figure 2c. Number of Affected People and Total Economic Damages in Asia (1960-2000)",
       x = "Year",  # Label for the x-axis
       y = "",      # Label for the y-axis
       color = "") +
  theme_minimal() +  # Using a minimal theme for a clean aesthetic
  scale_y_continuous(labels = scales::comma) +  # Formatting y-axis with commas for large numbers
  scale_color_manual(values = c("Total Economic Damages (USD)" = "blue", "Number of People Affected" = "red"))

# Make the plot interactive
l = ggplotly(p)
l
Asia should be prioritized for further analysis because it has the highest number of affected individuals. The graph shows a clear pattern: as the number of people affected by floods decreased, total economic damages also tended to fall, most visibly between the 1990 peak and 2000. This suggests that strategies like flood predictions and timely warnings could significantly lower economic losses by reducing the number of people affected.[2]
2.5 Data Limitations
Outdated data: The raw data is historical, covering the years 1960-2010, so the analysis may not accurately reflect the current situation.
Missing values: Incomplete surveys from some countries pose challenges for accurate estimations.
The IFRC Flood Resilience Program aims to improve resilience in vulnerable communities, aligning with the report’s insights on how floods impact populations and economic damages to support mitigation strategies.
3.2 Statistical Analysis
3.2.1 Linear Modelling
Code
# Rounding the value to 2dp
round(correlation, 2)
[1] 0.74
Despite the non-random pattern in the residual plot (Figure 2b), the linear model remains useful, showing a correlation of 0.74 between the two variables.
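The correlation of 0.74 can be cross-checked against the Multiple R-squared printed in the model summary in section 3.2.2, since in a simple linear regression the squared correlation equals R-squared. The short sketch below assumes only that printed value; the variable names are illustrative.

```r
# Multiple R-squared copied from the lm() summary in this document
r_squared <- 0.5524

# In simple linear regression |r| = sqrt(R^2); the sign of r follows the
# sign of the slope, which is positive in this model
r <- sqrt(r_squared)

print(round(r, 2))
```

Read the other way, an R-squared of 0.5524 means the number of people affected accounts for roughly 55% of the variation in economic damages, leaving the remainder to factors outside the model.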
3.2.2 Hypothesis Testing
Hypothesis:
H0: No significant linear relationship exists.
H1: A significant linear relationship exists.
Assumptions:
Independence: Data from different countries/years suggests independence.
Normality: Not satisfied, as the Q-Q plot shows deviation from the reference line, especially for theoretical quantiles ≥ 2 or ≤ −2.
Code
# Load necessary libraries
library(ggplot2)

# Fit the linear model
model <- lm(Total.economic.damages.from.floods ~ Number.of.people.affected.by.floods, data = data)

# Extract residuals from the model
residuals <- residuals(model)

# Create a data frame with residuals for plotting
residuals_df <- data.frame(residuals = residuals)

# Generate a Q-Q plot using ggplot2
ggplot(residuals_df, aes(sample = residuals)) +
  stat_qq(color = "red", size = 2) +  # Points are red and slightly larger
  stat_qq_line(color = "blue", linetype = "dashed", size = 1) +  # Reference line is dashed and blue
  labs(title = "Q-Q Plot of Residuals", x = "Theoretical Quantiles", y = "Sample Quantiles") +
  theme_minimal(base_size = 15) +  # Use a minimal theme with larger base font size
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),  # Center and bold the title
        axis.title = element_text(face = "bold"),  # Bold the axis titles
        panel.grid = element_line(size = 0.5, color = "gray80")  # Light gray grid lines
  )
Homoscedasticity: Not satisfied, as the residual plot does not show constant spread around zero.
Code
# Create residual plot
ggplot(model, aes(x = .fitted, y = .resid)) +  # Plotting fitted values and residuals
  geom_point(color = "red") +  # Setting the points to red for scatterplot
  geom_hline(yintercept = 0, linetype = "dashed", colour = "blue") +  # Horizontal line at 0
  labs(title = " Residual plot for Number of People Affected and Economic Damages",
       x = "Fitted values",
       y = "Residual values") +
  scale_x_continuous(labels = scales::comma) +  # Format x-axis with commas
  scale_y_continuous(labels = scales::comma)    # Format y-axis with commas
Linearity: The scatterplot looks linear.
Code
# Produce a scatterplot with a regression line
ggplot(clean_data, aes(x = Number.of.people.affected.by.floods, y = Total.economic.damages.from.floods)) +
  geom_point() +  # Scatterplot points in the default colour
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = " Scatterplot with regression line for Number of People Affected and Economic Damages",
       x = "Number of people affected",
       y = "Total economic damages (USD)") +
  scale_x_continuous(labels = scales::comma) +  # Format x-axis with commas
  scale_y_continuous(labels = scales::comma) +  # Format y-axis with commas
  theme_minimal()
Code
# Obtain a summary of the model
summary(model)
Call:
lm(formula = Total.economic.damages.from.floods ~ Number.of.people.affected.by.floods,
data = data)
Residuals:
Min 1Q Median 3Q Max
-7054942 -86063 -85844 -85122 28348291
Coefficients:
Estimate Std. Error t value
(Intercept) 85843.659821 40265.168385 2.132
Number.of.people.affected.by.floods 0.156255 0.004199 37.211
Pr(>|t|)
(Intercept) 0.0332 *
Number.of.people.affected.by.floods <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1337000 on 1122 degrees of freedom
(480 observations deleted due to missingness)
Multiple R-squared: 0.5524, Adjusted R-squared: 0.552
F-statistic: 1385 on 1 and 1122 DF, p-value: < 0.00000000000000022
Therefore, there was a significant positive linear trend between the two variables. However, two of the four assumptions (normality and homoscedasticity) are violated, so the test result may not be fully valid and other factors may be influencing the relationship.
3.2.3 Line chart
A line chart showing the variation of the two variables in Asia helps the client easily observe their relationship across many years, highlighting the significant decline in both variables after peaking in 1990.
3.3 Limitations
Missing values
The data was outdated.
The linear model didn’t account for factors like reconstruction costs and business disruptions affecting economic damages.
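To put a number on the missing-values limitation, the counts printed in the model summary in section 3.2.2 can be combined as in the sketch below. This is arithmetic on the reported output rather than a re-run of the analysis, and it assumes that lm() dropped only rows with a missing value in one of the two modelled columns; the variable names are illustrative.

```r
# Counts read off the lm() summary in this document
n_dropped <- 480   # "observations deleted due to missingness"
df_resid  <- 1122  # residual degrees of freedom
n_params  <- 2     # estimated coefficients: intercept and slope

# Rows actually used in the fit, and rows present before NAs were dropped
n_used  <- df_resid + n_params
n_total <- n_used + n_dropped

# Share of rows lost to missing values
share_dropped <- n_dropped / n_total

print(n_total)
print(round(100 * share_dropped, 1))
```

Under these assumptions, close to three in ten rows were excluded from the model, which is worth keeping in mind when generalising the results.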
The statistical results presented are based on rigorous analysis of the available data, free from external influence, and the methodologies used are clearly explained to ensure reproducibility. The report has made a conscientious effort to present data honestly, highlighting the strengths and acknowledging the limitations of the analysis, such as outdated data and external factors that affect the linear relationship.
4.2 Ethics Principles: Pursuing Objectivity
This report adheres to the principle of Pursuing Objectivity by selecting methods that ensure accurate and timely results, particularly focusing on minimizing economic losses due to floods. The data used has been carefully curated, with missing values removed to enhance the reliability of the analysis. All findings, including correlation coefficients and regression models, are presented